Questions:
1. Which sectors attract the most startup funding in India?
2. How has startup funding changed over time?
3. Does the location of a startup influence the funding it receives?
4. Who are the biggest investors in each sector? The purpose of this question is to help my team identify which investors to approach once we decide what we want to venture into.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpathes
import seaborn as sns
import warnings
import plotly.graph_objects as go
warnings.filterwarnings('ignore')
##importing data
Data_2018 = pd.read_csv("/Users/emmanythedon/Documents/India StartupFunding/startup_funding2018.csv")
Data_2019 = pd.read_csv("/Users/emmanythedon/Documents/India StartupFunding/startup_funding2019.csv")
Data_2020 = pd.read_csv("/Users/emmanythedon/Documents/India StartupFunding/startup_funding2020.csv")
Data_2021 = pd.read_csv("/Users/emmanythedon/Documents/India StartupFunding/startup_funding2021.csv")
Data_2021.head()
| | Company/Brand | Founded | HeadQuarter | Sector | What it does | Founders | Investor | Amount($) | Stage |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Unbox Robotics | 2019.0 | Bangalore | AI startup | Unbox Robotics builds on-demand AI-driven ware... | Pramod Ghadge, Shahid Memon | BEENEXT, Entrepreneur First | $1,200,000 | Pre-series A |
| 1 | upGrad | 2015.0 | Mumbai | EdTech | UpGrad is an online higher education platform. | Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,... | Unilazer Ventures, IIFL Asset Management | $120,000,000 | NaN |
| 2 | Lead School | 2012.0 | Mumbai | EdTech | LEAD School offers technology based school tra... | Smita Deorah, Sumeet Mehta | GSV Ventures, Westbridge Capital | $30,000,000 | Series D |
| 3 | Bizongo | 2015.0 | Mumbai | B2B E-commerce | Bizongo is a business-to-business online marke... | Aniket Deb, Ankit Tomar, Sachin Agrawal | CDC Group, IDG Capital | $51,000,000 | Series C |
| 4 | FypMoney | 2021.0 | Gurugram | FinTech | FypMoney is Digital NEO Bank for Teenagers, em... | Kapil Banwari | Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal | $2,000,000 | Seed |
Data_2020.head()
| | Company/Brand | Founded | HeadQuarter | Sector | What it does | Founders | Investor | Amount($) | Stage | Unnamed: 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aqgromalin | 2019 | Chennai | AgriTech | Cultivating Ideas for Profit | Prasanna Manogaran, Bharani C L | Angel investors | $200,000 | NaN | NaN |
| 1 | Krayonnz | 2019 | Bangalore | EdTech | An academy-guardian-scholar centric ecosystem ... | Saurabh Dixit, Gurudutt Upadhyay | GSF Accelerator | $100,000 | Pre-seed | NaN |
| 2 | PadCare Labs | 2018 | Pune | Hygiene management | Converting bio-hazardous waste to harmless waste | Ajinkya Dhariya | Venture Center | Undisclosed | Pre-seed | NaN |
| 3 | NCOME | 2020 | New Delhi | Escrow | Escrow-as-a-service platform | Ritesh Tiwari | Venture Catalysts, PointOne Capital | $400,000 | NaN | NaN |
| 4 | Gramophone | 2016 | Indore | AgriTech | Gramophone is an AgTech platform enabling acce... | Ashish Rajan Singh, Harshit Gupta, Nishant Mah... | Siana Capital Management, Info Edge | $340,000 | NaN | NaN |
Data_2019.head()
| | Company/Brand | Founded | HeadQuarter | Sector | What it does | Founders | Investor | Amount($) | Stage |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Bombay Shaving | NaN | NaN | Ecommerce | Provides a range of male grooming products | Shantanu Deshpande | Sixth Sense Ventures | $6,300,000 | NaN |
| 1 | Ruangguru | 2014.0 | Mumbai | Edtech | A learning platform that provides topic-based ... | Adamas Belva Syah Devara, Iman Usman. | General Atlantic | $150,000,000 | Series C |
| 2 | Eduisfun | NaN | Mumbai | Edtech | It aims to make learning fun via games. | Jatin Solanki | Deepak Parekh, Amitabh Bachchan, Piyush Pandey | $28,000,000 | Fresh funding |
| 3 | HomeLane | 2014.0 | Chennai | Interior design | Provides interior designing solutions | Srikanth Iyer, Rama Harinath | Evolvence India Fund (EIF), Pidilite Group, FJ... | $30,000,000 | Series D |
| 4 | Nu Genes | 2004.0 | Telangana | AgriTech | It is a seed company engaged in production, pr... | Narayana Reddy Punyala | Innovation in Food and Agriculture (IFA) | $6,000,000 | NaN |
Data_2018.head()
| | Company Name | Industry | Round/Series | Amount | Location | About Company |
|---|---|---|---|---|---|---|
| 0 | TheCollegeFever | Brand Marketing, Event Promotion, Marketing, S... | Seed | 250000 | Bangalore, Karnataka, India | TheCollegeFever is a hub for fun, fiesta and f... |
| 1 | Happy Cow Dairy | Agriculture, Farming | Seed | ₹40,000,000 | Mumbai, Maharashtra, India | A startup which aggregates milk from dairy far... |
| 2 | MyLoanCare | Credit, Financial Services, Lending, Marketplace | Series A | ₹65,000,000 | Gurgaon, Haryana, India | Leading Online Loans Marketplace in India |
| 3 | PayMe India | Financial Services, FinTech | Angel | 2000000 | Noida, Uttar Pradesh, India | PayMe India is an innovative FinTech organizat... |
| 4 | Eunimart | E-Commerce Platforms, Retail, SaaS | Seed | — | Hyderabad, Andhra Pradesh, India | Eunimart is a one stop solution for merchants ... |
Data_2021["Year of funding"]= 2021
Data_2020["Year of funding"]= 2020
Data_2019["Year of funding"]= 2019
Data_2018["Year of funding"]= 2018
Data_2021.drop_duplicates(inplace = True)
Data_2020.drop_duplicates(inplace = True)
Data_2019.drop_duplicates(inplace = True)
Data_2018.drop_duplicates(inplace = True)
Data_2021.rename(columns = {'Amount($)':'Amount(USD)'}, inplace = True)
Data_2020.rename(columns = {'Amount($)':'Amount(USD)'}, inplace = True)
Data_2019.rename(columns = {'Amount($)':'Amount(USD)'}, inplace = True)
# Strip '$' and ',' from each year's own amount column
Data_2021['Amount(USD)'] = Data_2021['Amount(USD)'].replace({r'\$': '', ',': ''}, regex=True)
Data_2020['Amount(USD)'] = Data_2020['Amount(USD)'].replace({r'\$': '', ',': ''}, regex=True)
Data_2019['Amount(USD)'] = Data_2019['Amount(USD)'].replace({r'\$': '', ',': ''}, regex=True)
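The same cleaning step is applied to each year's amount column, so it could be wrapped in one helper. A minimal sketch, assuming amounts arrive as strings such as `$1,200,000` or `Undisclosed`; the name `clean_amount` and the sample values are hypothetical and not part of the original notebook:

```python
import pandas as pd

def clean_amount(series: pd.Series) -> pd.Series:
    """Strip '$' and ',' from amount strings and convert them to numbers.

    Anything that is still not a number after stripping (for example
    'Undisclosed' or a missing value) becomes NaN rather than 0, so
    undisclosed amounts stay distinguishable from real zeros.
    """
    stripped = series.astype(str).str.replace(r"[$,]", "", regex=True).str.strip()
    return pd.to_numeric(stripped, errors="coerce")

# Hypothetical sample values in the style of the Amount($) column
sample = pd.Series(["$1,200,000", "Undisclosed", None, "$340,000"])
cleaned = clean_amount(sample)
print(cleaned)
```

Leaving undisclosed amounts as NaN is a design choice that differs from the notebook, which fills them with 0; NaN values are skipped by `sum()` and `median()` instead of pulling the statistics down.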
Data_2021.drop(['What it does','Founders','Stage'],axis=1, inplace = True)
Data_2020.drop(['What it does','Founders','Stage'],axis=1, inplace = True)
Data_2019.drop(['What it does','Founders','Stage'],axis=1, inplace = True)
Data_2021.head()
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount(USD) | Year of funding |
|---|---|---|---|---|---|---|---|
| 0 | Unbox Robotics | 2019.0 | Bangalore | AI startup | BEENEXT, Entrepreneur First | 200000 | 2021 |
| 1 | upGrad | 2015.0 | Mumbai | EdTech | Unilazer Ventures, IIFL Asset Management | 100000 | 2021 |
| 2 | Lead School | 2012.0 | Mumbai | EdTech | GSV Ventures, Westbridge Capital | Undisclosed | 2021 |
| 3 | Bizongo | 2015.0 | Mumbai | B2B E-commerce | CDC Group, IDG Capital | 400000 | 2021 |
| 4 | FypMoney | 2021.0 | Gurugram | FinTech | Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal | 340000 | 2021 |
Data_2021.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1190 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    1190 non-null   object
 1   Founded          1189 non-null   float64
 2   HeadQuarter      1189 non-null   object
 3   Sector           1190 non-null   object
 4   Investor         1129 non-null   object
 5   Amount(USD)      1030 non-null   object
 6   Year of funding  1190 non-null   int64
dtypes: float64(1), int64(1), object(5)
memory usage: 74.4+ KB
Data_2021.isnull().sum()
Company/Brand        0
Founded              1
HeadQuarter          1
Sector               0
Investor            61
Amount(USD)        160
Year of funding      0
dtype: int64
Data_2021['Founded'].replace(np.NAN, value=2021, inplace=True)
Data_2021['HeadQuarter'].replace(np.NAN, value='Gurugram', inplace=True)
Data_2021['Founded']= Data_2021['Founded'].astype('int')
Data_2021['Amount(USD)'] = pd.to_numeric(Data_2021['Amount(USD)'], errors='coerce').fillna(0, downcast='infer')
Data_2021.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1190 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    1190 non-null   object
 1   Founded          1190 non-null   int64
 2   HeadQuarter      1190 non-null   object
 3   Sector           1190 non-null   object
 4   Investor         1129 non-null   object
 5   Amount(USD)      1190 non-null   float64
 6   Year of funding  1190 non-null   int64
dtypes: float64(1), int64(2), object(4)
memory usage: 74.4+ KB
Data_2021['Investor'].replace(np.NAN, value= "unknown", inplace=True)
Data_2021.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1190 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    1190 non-null   object
 1   Founded          1190 non-null   int64
 2   HeadQuarter      1190 non-null   object
 3   Sector           1190 non-null   object
 4   Investor         1190 non-null   object
 5   Amount(USD)      1190 non-null   float64
 6   Year of funding  1190 non-null   int64
dtypes: float64(1), int64(2), object(4)
memory usage: 74.4+ KB
Data_2021.describe()
| | Founded | Amount(USD) | Year of funding |
|---|---|---|---|
| count | 1190.000000 | 1.190000e+03 | 1190.0 |
| mean | 2016.637815 | 7.559443e+07 | 2021.0 |
| std | 4.521968 | 2.032155e+09 | 0.0 |
| min | 1963.000000 | 0.000000e+00 | 2021.0 |
| 25% | 2015.000000 | 0.000000e+00 | 2021.0 |
| 50% | 2018.000000 | 1.000000e+06 | 2021.0 |
| 75% | 2020.000000 | 5.475000e+06 | 2021.0 |
| max | 2021.000000 | 7.000000e+10 | 2021.0 |
Data_2020.head()
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount(USD) | Unnamed: 9 | Year of funding |
|---|---|---|---|---|---|---|---|---|
| 0 | Aqgromalin | 2019 | Chennai | AgriTech | Angel investors | 200000 | NaN | 2020 |
| 1 | Krayonnz | 2019 | Bangalore | EdTech | GSF Accelerator | 100000 | NaN | 2020 |
| 2 | PadCare Labs | 2018 | Pune | Hygiene management | Venture Center | Undisclosed | NaN | 2020 |
| 3 | NCOME | 2020 | New Delhi | Escrow | Venture Catalysts, PointOne Capital | 400000 | NaN | 2020 |
| 4 | Gramophone | 2016 | Indore | AgriTech | Siana Capital Management, Info Edge | 340000 | NaN | 2020 |
Data_2020.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    1052 non-null   object
 1   Founded          840 non-null    object
 2   HeadQuarter      958 non-null    object
 3   Sector           1039 non-null   object
 4   Investor         1014 non-null   object
 5   Amount(USD)      1049 non-null   object
 6   Unnamed: 9       2 non-null      object
 7   Year of funding  1052 non-null   int64
dtypes: int64(1), object(7)
memory usage: 106.3+ KB
Data_2020.drop(["Unnamed: 9"],axis =1,inplace =True )
Data_2020.head()
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount(USD) | Year of funding |
|---|---|---|---|---|---|---|---|
| 0 | Aqgromalin | 2019 | Chennai | AgriTech | Angel investors | 200000 | 2020 |
| 1 | Krayonnz | 2019 | Bangalore | EdTech | GSF Accelerator | 100000 | 2020 |
| 2 | PadCare Labs | 2018 | Pune | Hygiene management | Venture Center | Undisclosed | 2020 |
| 3 | NCOME | 2020 | New Delhi | Escrow | Venture Catalysts, PointOne Capital | 400000 | 2020 |
| 4 | Gramophone | 2016 | Indore | AgriTech | Siana Capital Management, Info Edge | 340000 | 2020 |
Data_2020.isnull().sum()
Company/Brand        0
Founded            212
HeadQuarter         94
Sector              13
Investor            38
Amount(USD)          3
Year of funding      0
dtype: int64
Data_2020['Investor'].replace(np.NAN, value= "unknown", inplace=True)
Data_2020['Sector'].replace(np.NAN, value= "unknown",inplace=True)
Data_2020['Founded'].fillna(0, inplace=True)
Data_2020['HeadQuarter'].replace(np.NAN, value= "unknown",inplace=True)
Data_2020.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    1052 non-null   object
 1   Founded          1052 non-null   object
 2   HeadQuarter      1052 non-null   object
 3   Sector           1052 non-null   object
 4   Investor         1052 non-null   object
 5   Amount(USD)      1049 non-null   object
 6   Year of funding  1052 non-null   int64
dtypes: int64(1), object(6)
memory usage: 98.0+ KB
Updated = Data_2020['Amount(USD)'] =='Undisclosed'
Data_2020.loc[Updated, 'Amount(USD)'] = 0
Data_2020.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    1052 non-null   object
 1   Founded          1052 non-null   object
 2   HeadQuarter      1052 non-null   object
 3   Sector           1052 non-null   object
 4   Investor         1052 non-null   object
 5   Amount(USD)      1049 non-null   object
 6   Year of funding  1052 non-null   int64
dtypes: int64(1), object(6)
memory usage: 98.0+ KB
Data_2020['Amount(USD)'] = Data_2020['Amount(USD)'].fillna(0)
Data_2020.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    1052 non-null   object
 1   Founded          1052 non-null   object
 2   HeadQuarter      1052 non-null   object
 3   Sector           1052 non-null   object
 4   Investor         1052 non-null   object
 5   Amount(USD)      1052 non-null   object
 6   Year of funding  1052 non-null   int64
dtypes: int64(1), object(6)
memory usage: 98.0+ KB
Data_2020.isnull().sum()
Company/Brand      0
Founded            0
HeadQuarter        0
Sector             0
Investor           0
Amount(USD)        0
Year of funding    0
dtype: int64
Data_2020['Amount(USD)'] = pd.to_numeric(Data_2020['Amount(USD)'], errors='coerce').fillna(0, downcast='infer')
Data_2020['Founded'] = pd.to_numeric(Data_2020['Founded'], errors='coerce').fillna(0, downcast='infer')
Data_2020.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    1052 non-null   object
 1   Founded          1052 non-null   int64
 2   HeadQuarter      1052 non-null   object
 3   Sector           1052 non-null   object
 4   Investor         1052 non-null   object
 5   Amount(USD)      1052 non-null   float64
 6   Year of funding  1052 non-null   int64
dtypes: float64(1), int64(2), object(4)
memory usage: 98.0+ KB
Updated = Data_2020['Amount(USD)'] =='Undisclosed'
Data_2020.loc[Updated, 'Amount(USD)'] = 0
Data_2020.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1052 entries, 0 to 1054
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    1052 non-null   object
 1   Founded          1052 non-null   int64
 2   HeadQuarter      1052 non-null   object
 3   Sector           1052 non-null   object
 4   Investor         1052 non-null   object
 5   Amount(USD)      1052 non-null   float64
 6   Year of funding  1052 non-null   int64
dtypes: float64(1), int64(2), object(4)
memory usage: 98.0+ KB
Data_2019.head()
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount(USD) | Year of funding |
|---|---|---|---|---|---|---|---|
| 0 | Bombay Shaving | NaN | NaN | Ecommerce | Sixth Sense Ventures | 200000 | 2019 |
| 1 | Ruangguru | 2014.0 | Mumbai | Edtech | General Atlantic | 100000 | 2019 |
| 2 | Eduisfun | NaN | Mumbai | Edtech | Deepak Parekh, Amitabh Bachchan, Piyush Pandey | Undisclosed | 2019 |
| 3 | HomeLane | 2014.0 | Chennai | Interior design | Evolvence India Fund (EIF), Pidilite Group, FJ... | 400000 | 2019 |
| 4 | Nu Genes | 2004.0 | Telangana | AgriTech | Innovation in Food and Agriculture (IFA) | 340000 | 2019 |
Data_2019.isnull().sum()
Company/Brand       0
Founded            29
HeadQuarter        19
Sector              5
Investor            0
Amount(USD)         2
Year of funding     0
dtype: int64
Data_2019["Sector"].mode()
0    Edtech
Name: Sector, dtype: object
Data_2019['Sector'].fillna(Data_2019['Sector'].mode()[0], inplace = True)  # mode() returns a Series, so take its first value
Data_2019.isnull().sum()
Company/Brand       0
Founded            29
HeadQuarter        19
Sector              0
Investor            0
Amount(USD)         2
Year of funding     0
dtype: int64
###Changing amount datatype to int/float
Data_2019['Amount(USD)'] = pd.to_numeric(Data_2019['Amount(USD)'], errors='coerce').fillna(0, downcast='infer')
Data_2019.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 89 entries, 0 to 88
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    89 non-null     object
 1   Founded          60 non-null     float64
 2   HeadQuarter      70 non-null     object
 3   Sector           89 non-null     object
 4   Investor         89 non-null     object
 5   Amount(USD)      89 non-null     int64
 6   Year of funding  89 non-null     int64
dtypes: float64(1), int64(2), object(4)
memory usage: 5.6+ KB
Data_2019.describe()
| | Founded | Amount(USD) | Year of funding |
|---|---|---|---|
| count | 60.000000 | 8.900000e+01 | 89.0 |
| mean | 2014.533333 | 2.791803e+07 | 2019.0 |
| std | 2.937003 | 1.095456e+08 | 0.0 |
| min | 2004.000000 | 0.000000e+00 | 2019.0 |
| 25% | 2013.000000 | 0.000000e+00 | 2019.0 |
| 50% | 2015.000000 | 7.500000e+05 | 2019.0 |
| 75% | 2016.250000 | 7.000000e+06 | 2019.0 |
| max | 2019.000000 | 7.000000e+08 | 2019.0 |
null_data2 = Data_2019[Data_2019.isnull().any(axis=1)]
null_data2.head()
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount(USD) | Year of funding |
|---|---|---|---|---|---|---|---|
| 0 | Bombay Shaving | NaN | NaN | Ecommerce | Sixth Sense Ventures | 200000 | 2019 |
| 2 | Eduisfun | NaN | Mumbai | Edtech | Deepak Parekh, Amitabh Bachchan, Piyush Pandey | 0 | 2019 |
| 5 | FlytBase | NaN | Pune | Technology | Undisclosed | 600000 | 2019 |
| 6 | Finly | NaN | Bangalore | SaaS | Social Capital, AngelList India, Gemba Capital... | 600000 | 2019 |
| 8 | Quantiphi | NaN | NaN | AI & Tech | Multiples Alternate Asset Management | 45000000 | 2019 |
Data_2019['Founded'].fillna(0, inplace=True) ## replacing missing founding years with 0
Data_2019['HeadQuarter'].replace(np.NAN, value= "unknown",inplace=True) ## replacing missing headquarters with 'unknown'
Data_2019.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 89 entries, 0 to 88
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    89 non-null     object
 1   Founded          89 non-null     float64
 2   HeadQuarter      89 non-null     object
 3   Sector           89 non-null     object
 4   Investor         89 non-null     object
 5   Amount(USD)      89 non-null     int64
 6   Year of funding  89 non-null     int64
dtypes: float64(1), int64(2), object(4)
memory usage: 5.6+ KB
Updated = Data_2019['Amount(USD)'] =='Undisclosed'  # build the mask from the same frame it is applied to
Data_2019.loc[Updated, 'Amount(USD)'] = 0
Data_2019.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 89 entries, 0 to 88
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    89 non-null     object
 1   Founded          89 non-null     float64
 2   HeadQuarter      89 non-null     object
 3   Sector           89 non-null     object
 4   Investor         89 non-null     object
 5   Amount(USD)      89 non-null     int64
 6   Year of funding  89 non-null     int64
dtypes: float64(1), int64(2), object(4)
memory usage: 5.6+ KB
Columns = [Data_2019, Data_2020, Data_2021]
Merged_Data = pd.concat(Columns)
Merged_Data
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount(USD) | Year of funding |
|---|---|---|---|---|---|---|---|
| 0 | Bombay Shaving | 0.0 | unknown | Ecommerce | Sixth Sense Ventures | 200000.0 | 2019 |
| 1 | Ruangguru | 2014.0 | Mumbai | Edtech | General Atlantic | 100000.0 | 2019 |
| 2 | Eduisfun | 0.0 | Mumbai | Edtech | Deepak Parekh, Amitabh Bachchan, Piyush Pandey | 0.0 | 2019 |
| 3 | HomeLane | 2014.0 | Chennai | Interior design | Evolvence India Fund (EIF), Pidilite Group, FJ... | 400000.0 | 2019 |
| 4 | Nu Genes | 2004.0 | Telangana | AgriTech | Innovation in Food and Agriculture (IFA) | 340000.0 | 2019 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1204 | Gigforce | 2019.0 | Gurugram | Staffing & Recruiting | Endiya Partners | 0.0 | 2021 |
| 1205 | Vahdam | 2015.0 | New Delhi | Food & Beverages | IIFL AMC | 0.0 | 2021 |
| 1206 | Leap Finance | 2019.0 | Bangalore | Financial Services | Owl Ventures | 0.0 | 2021 |
| 1207 | CollegeDekho | 2015.0 | Gurugram | EdTech | Winter Capital, ETS, Man Capital | 0.0 | 2021 |
| 1208 | WeRize | 2019.0 | Bangalore | Financial Services | 3one4 Capital, Kalaari Capital | 0.0 | 2021 |
2331 rows × 7 columns
Merged_Data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2331 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    2331 non-null   object
 1   Founded          2331 non-null   float64
 2   HeadQuarter      2331 non-null   object
 3   Sector           2331 non-null   object
 4   Investor         2331 non-null   object
 5   Amount(USD)      2331 non-null   float64
 6   Year of funding  2331 non-null   int64
dtypes: float64(2), int64(1), object(4)
memory usage: 145.7+ KB
Merged_Data['Founded']= Merged_Data['Founded'].astype('int')
Merged_Data['Year of funding']= Merged_Data['Year of funding'].astype('int')
Merged_Data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2331 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    2331 non-null   object
 1   Founded          2331 non-null   int64
 2   HeadQuarter      2331 non-null   object
 3   Sector           2331 non-null   object
 4   Investor         2331 non-null   object
 5   Amount(USD)      2331 non-null   float64
 6   Year of funding  2331 non-null   int64
dtypes: float64(1), int64(2), object(4)
memory usage: 145.7+ KB
KEEPING ONLY THE FIRST INVESTOR LISTED IN THE INVESTOR COLUMN
Merged_Data['Investor'] = Merged_Data['Investor'].apply(str) # To apply string formatting to the whole column
Merged_Data['Investor'] = Merged_Data['Investor'].str.split(',').str[0] # To separate the values in the column by commas and select the first value only
Merged_Data['Investor'] = Merged_Data['Investor'].replace("'", "", regex=True) # Remove any ' that may be attached to the data
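Keeping only the first name drops every co-investor, which matters for Question 4. A sketch of one alternative, assuming investor names are separated by commas: split the column and explode it into one row per investor. The frame `deals` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical deals with comma-separated investor lists
deals = pd.DataFrame({
    "Company/Brand": ["A Co", "B Co"],
    "Investor": ["Fund X, Fund Y", "Fund X"],
    "Amount": [100.0, 50.0],
})

# One row per (deal, investor): split on commas, then explode the lists
per_investor = (
    deals.assign(Investor=deals["Investor"].str.split(","))
         .explode("Investor")
)
per_investor["Investor"] = per_investor["Investor"].str.strip()

# Count deals per investor. Amounts are not summed here because a round's
# total cannot be attributed to any single co-investor.
deal_counts = per_investor["Investor"].value_counts()
print(deal_counts)
```

Counting deals rather than summing amounts avoids double-counting a round once per participating investor.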
# Map spelling variants of each sector to one canonical name (each key listed once)
Merged_Data.replace({'Sector':{'EdTech':'Edtech','FinTech':'Fintech', 'HealthCare': 'Healthcare',
    'SaaS startup':'SaaS','HealthTech': 'Healthtech',
    'Ecommerce': 'E-commerce','Food':'Foodtech', 'AI startup':'AI',
    'AgriTech':'Agritech','Logistics & Supply Chain':'Logistics',
    'Tech':'Tech company','IT':'Information Technology','Computer software':'Computer Software',
    'Technology':'Tech company','Hospital & Health Care':'Healthcare','Automobile':'Automotive',
    'Social Media':'Media','Fashion':'Apparel & Fashion'}}, inplace = True)
# Replacing duplicate sector names in Sector column with a single name
Merged_Data.replace({'Sector':{'EdTech':'Edtech','FinTech':'Fintech','HealthCare':'Healthcare',
'SaaS startup':'SaaS','HealthTech': 'Healthtech', 'Ecommerce': 'E-commerce',
'Food':'Foodtech','AI startup':'AI','AgriTech':'Agritech','Logistics & Supply Chain':'Logistics',
'IT':'Information Technology','Automobile':'Automotive','Tech':'Tech Startup'}}, inplace = True)
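A hand-written mapping has to list every spelling variant explicitly. As a complement, variants that differ only in capitalisation or surrounding whitespace can be merged automatically. A minimal sketch under that assumption; the series `sectors` holds hypothetical sample values:

```python
import pandas as pd

# Hypothetical sector labels with case and whitespace variants
sectors = pd.Series(["EdTech", "Edtech", " edtech", "FinTech", "Fintech", "SaaS"])

# Build a comparison key that ignores case and surrounding whitespace
trimmed = sectors.str.strip()
key = trimmed.str.casefold()

# For each key, use the most frequent original spelling as the canonical label
canonical = trimmed.groupby(key).agg(lambda s: s.value_counts().idxmax())
normalised = key.map(canonical)
print(normalised.value_counts())
```

This does not merge genuinely different names such as `Healthcare` and `Hospital & Health Care`; those still need an explicit mapping.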
Merged_Data
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount(USD) | Year of funding |
|---|---|---|---|---|---|---|---|
| 0 | Bombay Shaving | 0 | unknown | E-commerce | Sixth Sense Ventures | 200000.0 | 2019 |
| 1 | Ruangguru | 2014 | Mumbai | Edtech | General Atlantic | 100000.0 | 2019 |
| 2 | Eduisfun | 0 | Mumbai | Edtech | Deepak Parekh | 0.0 | 2019 |
| 3 | HomeLane | 2014 | Chennai | Interior design | Evolvence India Fund (EIF) | 400000.0 | 2019 |
| 4 | Nu Genes | 2004 | Telangana | Agritech | Innovation in Food and Agriculture (IFA) | 340000.0 | 2019 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1204 | Gigforce | 2019 | Gurugram | Staffing & Recruiting | Endiya Partners | 0.0 | 2021 |
| 1205 | Vahdam | 2015 | New Delhi | Food & Beverages | IIFL AMC | 0.0 | 2021 |
| 1206 | Leap Finance | 2019 | Bangalore | Financial Services | Owl Ventures | 0.0 | 2021 |
| 1207 | CollegeDekho | 2015 | Gurugram | Edtech | Winter Capital | 0.0 | 2021 |
| 1208 | WeRize | 2019 | Bangalore | Financial Services | 3one4 Capital | 0.0 | 2021 |
2331 rows × 7 columns
Merged_Data.replace({'HeadQuarter':{'New Delhi':'Delhi'}}, inplace = True)
group_by_sector = Merged_Data["Sector"].value_counts()
group_by_sector.head(40)
Fintech                              257
Edtech                               215
E-commerce                            96
Healthcare                            79
Agritech                              62
Healthtech                            60
Financial Services                    60
SaaS                                  58
Logistics                             51
Automotive                            46
AI                                    42
Food & Beverages                      38
Tech company                          36
Gaming                                35
Media                                 34
Information Technology & Services     34
Computer Software                     31
Foodtech                              29
Tech Startup                          25
Retail                                24
Consumer Goods                        24
E-learning                            24
Apparel & Fashion                     20
Information Technology                18
Hospitality                           18
Health, Wellness & Fitness            17
Entertainment                         16
Real Estate                           13
unknown                               13
Cosmetics                             12
IoT                                   11
Transportation                         9
Finance                                9
Insurance                              8
Deeptech                               8
Insuretech                             7
Mobility                               7
Health                                 7
IT startup                             6
Food Industry                          6
Name: Sector, dtype: int64
group_by_Yearoffunding = Merged_Data["Year of funding"].value_counts()
group_by_Yearoffunding
2021    1190
2020    1052
2019      89
Name: Year of funding, dtype: int64
Merged_Data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2331 entries, 0 to 1208
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company/Brand    2331 non-null   object
 1   Founded          2331 non-null   int64
 2   HeadQuarter      2331 non-null   object
 3   Sector           2331 non-null   object
 4   Investor         2331 non-null   object
 5   Amount(USD)      2331 non-null   float64
 6   Year of funding  2331 non-null   int64
dtypes: float64(1), int64(2), object(4)
memory usage: 145.7+ KB
pd.set_option('display.float_format', lambda x: '%.2f' % x)
Merged_Data.rename(columns = {'Amount(USD)':'Amount'}, inplace = True)
Amount_by_sectors = Merged_Data.groupby(by = "Sector").Amount.agg(["sum","count"]).sort_values(by = ["sum"],
ascending = False)
Amount_by_sectors
| Sector | sum | count |
|---|---|---|
| Retail | 70477743000.00 | 24 |
| Information Technology & Services | 70210150000.00 | 34 |
| Edtech | 5712520230.00 | 215 |
| Fintech | 4598733709.60 | 257 |
| Tech company | 3409583900.00 | 36 |
| ... | ... | ... |
| Manchester, Greater Manchester | 0.00 | 1 |
| Digital mortgage | 0.00 | 1 |
| MarTech | 0.00 | 1 |
| SaaS startup | 0.00 | 1 |
| Food Startup | 0.00 | 1 |
486 rows × 2 columns
plt.figure(figsize = (20,15))
plt.xticks(fontsize = 20)
plt.yticks(fontsize = 20)
sns.barplot(y = Amount_by_sectors[:10].index, x = (Amount_by_sectors["sum"])[:10])
plt.ylabel("SECTORS",fontsize = 40,fontweight = 'bold')
plt.xlabel("Total Funding Received",fontsize = 25,fontweight = 'bold')
plt.title("FUNDING RECEIVED BY SECTORS",fontsize = 40,fontweight = 'bold')
Text(0.5, 1.0, 'FUNDING RECEIVED BY SECTORS')
FOR QUESTION 2: How has funding of startups changed over time?
FILTERING OUT ZERO VALUES FROM THE AMOUNT COLUMN
Sample1 = Merged_Data[Merged_Data['Amount'] != 0].sort_values('Amount', ascending=False)
Sample1
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount | Year of funding |
|---|---|---|---|---|---|---|---|
| 280 | Keka HR | 2014 | Hyderabad | Information Technology & Services | Recur Club | 70000000000.00 | 2021 |
| 280 | Reliance Retail Ventures Ltd | 2006 | Mumbai | Retail | Silver Lake | 70000000000.00 | 2020 |
| 317 | Infra.Market | 2016 | Thane | Construction | InnoVen Capital | 3000000000.00 | 2021 |
| 317 | Snowflake | 2012 | California | Tech company | Salesforce Ventures | 3000000000.00 | 2020 |
| 328 | WizKlub | 2018 | Bangalore | Edtech | Incubate Fund India | 2200000000.00 | 2021 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1000 | VLCC Health Care | 1989 | Gurugram | Health, Wellness & Fitness | unknown | 12700.00 | 2021 |
| 834 | Get My Parking | 2015 | Delhi | Mobility | IvyCap Ventures | 42.23 | 2021 |
| 834 | Peel Works | 2010 | Mumbai | SaaS | CESC Ventures | 42.23 | 2020 |
| 552 | SATYA Microcapital | 1995 | Delhi | Fintech | BlueOrchard Finance Limited | 9.60 | 2020 |
| 552 | Indi Energy | 2019 | Roorkee | Renewables & Environment | Mumbai Angels Network | 9.60 | 2021 |
1657 rows × 7 columns
plt.rcParams['figure.figsize'] = (20,10)  # set the figure size before plotting so it applies to this chart
yearly_totals = Sample1.groupby('Year of funding')['Amount'].sum()  # total funding per year, ordered by year
d = list(yearly_totals.index)   # years
c = list(yearly_totals.values)  # total amount received in each year
sns.scatterplot(x = d, y = c)   # keyword arguments are required by recent seaborn versions
plt.plot(d,c)
plt.xlabel('YEAR FUNDING WAS RECEIVED', fontsize = 22, fontweight = 'bold')
plt.ylabel('Amount Received (USD)', fontsize = 22, fontweight = 'bold')
plt.xticks(fontsize = 15)
plt.yticks(fontsize = 15)
plt.title('FUNDING RECEIVED BY STARTUPS FROM 2019 TO 2021', fontsize = 22, fontweight = 'bold')
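Yearly totals can be dominated by a handful of very large rounds, so it may help to look at the number of deals and the median deal size next to the sum. A minimal sketch with hypothetical figures; the frame `funding` is illustrative only:

```python
import pandas as pd

# Hypothetical deals, including one very large round in 2020
funding = pd.DataFrame({
    "Year of funding": [2019, 2019, 2020, 2020, 2020, 2021],
    "Amount": [1e6, 3e6, 2e6, 4e6, 7e10, 5e6],
})

# Deal count, total and median amount per year
yearly = (
    funding.groupby("Year of funding")["Amount"]
           .agg(["count", "sum", "median"])
)
print(yearly)
```

In this sample the 2020 total is driven almost entirely by a single round, while the median stays in the same range as the other years.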
To get this bar chart, a new dataframe called Sample2 was created by removing the rows whose headquarters is unknown.
Sample2 = Merged_Data[Merged_Data['HeadQuarter'] != 'unknown']
Merged_Data.head()
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount | Year of funding |
|---|---|---|---|---|---|---|---|
| 0 | Bombay Shaving | 0 | unknown | E-commerce | Sixth Sense Ventures | 200000.00 | 2019 |
| 1 | Ruangguru | 2014 | Mumbai | Edtech | General Atlantic | 100000.00 | 2019 |
| 2 | Eduisfun | 0 | Mumbai | Edtech | Deepak Parekh | 0.00 | 2019 |
| 3 | HomeLane | 2014 | Chennai | Interior design | Evolvence India Fund (EIF) | 400000.00 | 2019 |
| 4 | Nu Genes | 2004 | Telangana | Agritech | Innovation in Food and Agriculture (IFA) | 340000.00 | 2019 |
group_by_location = Sample2["HeadQuarter"].value_counts()
group_by_location
Bangalore 758
Mumbai 374
Delhi 251
Gurugram 239
Chennai 87
...
Ludhiana 1
Rajastan 1
Jiaxing, Zhejiang, China 1
Shanghai, China 1
Gandhinagar 1
Name: HeadQuarter, Length: 122, dtype: int64
Amount_by_location = Sample2.groupby(by = "HeadQuarter").Amount.agg(["sum"]).sort_values(by = ["sum"],
    ascending = False)
Amount_by_location
| HeadQuarter | sum |
|---|---|
| Mumbai | 77928437618.23 |
| Hyderabad | 70264697000.00 |
| Bangalore | 12930857158.00 |
| Gurugram | 3454999700.00 |
| Delhi | 3273139879.83 |
| ... | ... |
| Bengaluru | 0.00 |
| Palmwoods, Queensland, Australia | 0.00 |
| Bhopal | 0.00 |
| San Franciscao | 0.00 |
| Guwahati | 0.00 |
122 rows × 1 columns
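The grouped output lists the same city under more than one label (for example both `Bangalore` and `Bengaluru`) along with values of the form "city, state, country". A sketch of one way to consolidate such labels before grouping; the `aliases` table and the sample values are assumptions and would need extending after inspecting `value_counts()`:

```python
import pandas as pd

# Hypothetical headquarters labels in the styles seen in the data
locations = pd.Series(["Bangalore", "Bengaluru", "New Delhi", "Mumbai, Maharashtra, India"])

# Hypothetical alias table mapping alternative names to one label
aliases = {"Bengaluru": "Bangalore", "New Delhi": "Delhi"}

# Keep the part before the first comma, trim whitespace, then apply the aliases
city = locations.str.split(",").str[0].str.strip().replace(aliases)
print(city.tolist())
```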
Getting the top 10 rows of the Amount_by_location dataframe
Amount_by_location_top = Amount_by_location.iloc[:10]
Amount_by_location_top
| HeadQuarter | sum |
|---|---|
| Mumbai | 77928437618.23 |
| Hyderabad | 70264697000.00 |
| Bangalore | 12930857158.00 |
| Gurugram | 3454999700.00 |
| Delhi | 3273139879.83 |
| Thane | 3081415000.00 |
| California | 3078300000.00 |
| Pune | 1165935000.00 |
| Noida | 1035734300.00 |
| Haryana | 851549800.00 |
Amount_by_location_top.plot(kind='bar', title='FUNDING ASSOCIATED WITH THEIR LOCATIONS', ylabel='Amount Received',
    xlabel='HEADQUARTERS', figsize=(25,6))
<AxesSubplot:title={'center':'FUNDING ASSOCIATED WITH THEIR LOCATIONS'}, xlabel='HEADQUARTERS', ylabel='Amount Received'>
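plt.figure(figsize = (20,10))  # set the size before drawing so it applies to this chart
# Horizontal bars: one bar per headquarters, bar length is the total amount received
plt.barh(Amount_by_location_top.index, Amount_by_location_top['sum'])
plt.title('FUNDING ASSOCIATED WITH THEIR LOCATIONS', fontsize = 19)
plt.xlabel('AMOUNT RECEIVED', fontsize = 19)
plt.ylabel('HEADQUARTERS', fontsize = 19)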
fig = plt.figure(figsize = (14, 8))
# Take both axes and the marker sizes from Sample1 so all three arrays have the same length
plt.scatter(Sample1['Founded'], Sample1["Year of funding"],
    s=Sample1["Amount"]* 0.001, alpha=0.5)
plt.show()
ax = Merged_Data.plot(kind ='scatter',x = 'Founded',
    y = 'Year of funding', figsize =(10,5),alpha =0.5,  # y must name an existing column
    color = 'yellow', s = Merged_Data["Amount" ]* 1000 + 10)
ax.set_ylabel("")
ax.set_title("HOW FUNDING HAS CHANGED OVER TIME")
ax.legend(['Amount'],loc = 'upper right', fontsize = 'x-large')
Sample3 = Merged_Data[Merged_Data['Investor'] != 'unknown']
Sample3
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount | Year of funding |
|---|---|---|---|---|---|---|---|
| 0 | Bombay Shaving | 0 | unknown | E-commerce | Sixth Sense Ventures | 200000.00 | 2019 |
| 1 | Ruangguru | 2014 | Mumbai | Edtech | General Atlantic | 100000.00 | 2019 |
| 2 | Eduisfun | 0 | Mumbai | Edtech | Deepak Parekh | 0.00 | 2019 |
| 3 | HomeLane | 2014 | Chennai | Interior design | Evolvence India Fund (EIF) | 400000.00 | 2019 |
| 4 | Nu Genes | 2004 | Telangana | Agritech | Innovation in Food and Agriculture (IFA) | 340000.00 | 2019 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1204 | Gigforce | 2019 | Gurugram | Staffing & Recruiting | Endiya Partners | 0.00 | 2021 |
| 1205 | Vahdam | 2015 | Delhi | Food & Beverages | IIFL AMC | 0.00 | 2021 |
| 1206 | Leap Finance | 2019 | Bangalore | Financial Services | Owl Ventures | 0.00 | 2021 |
| 1207 | CollegeDekho | 2015 | Gurugram | Edtech | Winter Capital | 0.00 | 2021 |
| 1208 | WeRize | 2019 | Bangalore | Financial Services | 3one4 Capital | 0.00 | 2021 |
2232 rows × 7 columns
Sample4 = Sample3 [Sample3 ['Amount'] != 0]
Sample4
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount | Year of funding |
|---|---|---|---|---|---|---|---|
| 0 | Bombay Shaving | 0 | unknown | E-commerce | Sixth Sense Ventures | 200000.00 | 2019 |
| 1 | Ruangguru | 2014 | Mumbai | Edtech | General Atlantic | 100000.00 | 2019 |
| 3 | HomeLane | 2014 | Chennai | Interior design | Evolvence India Fund (EIF) | 400000.00 | 2019 |
| 4 | Nu Genes | 2004 | Telangana | Agritech | Innovation in Food and Agriculture (IFA) | 340000.00 | 2019 |
| 5 | FlytBase | 0 | Pune | Tech company | Undisclosed | 600000.00 | 2019 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1050 | Zenduty | 2019 | Bangalore | Computer Software | StartupXseed Ventures | 1500000.00 | 2021 |
| 1051 | R for Rabbit | 2014 | Ahmedabad | Consumer Goods | Xponentia Capital Partners | 13200000.00 | 2021 |
| 1052 | Acko | 2016 | Bangalore | Insurance | General Atlantic | 8000000.00 | 2021 |
| 1053 | LoveLocal | 2015 | Mumbai | Retail | Vulcan Capital | 8043000.00 | 2021 |
| 1054 | SupplyNote | 2015 | Noida | Food & Beverages | Venture Catalysts | 9000000.00 | 2021 |
1589 rows × 7 columns
Sample5 = Sample4 [Sample4 ['Founded'] != 0]
Sample5
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount | Year of funding |
|---|---|---|---|---|---|---|---|
| 1 | Ruangguru | 2014 | Mumbai | Edtech | General Atlantic | 100000.00 | 2019 |
| 3 | HomeLane | 2014 | Chennai | Interior design | Evolvence India Fund (EIF) | 400000.00 | 2019 |
| 4 | Nu Genes | 2004 | Telangana | Agritech | Innovation in Food and Agriculture (IFA) | 340000.00 | 2019 |
| 9 | Lenskart | 2010 | Delhi | E-commerce | SoftBank | 1000000.00 | 2019 |
| 10 | Cub McPaws | 2010 | Mumbai | E-commerce & AR | Venture Catalysts | 2000000.00 | 2019 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1050 | Zenduty | 2019 | Bangalore | Computer Software | StartupXseed Ventures | 1500000.00 | 2021 |
| 1051 | R for Rabbit | 2014 | Ahmedabad | Consumer Goods | Xponentia Capital Partners | 13200000.00 | 2021 |
| 1052 | Acko | 2016 | Bangalore | Insurance | General Atlantic | 8000000.00 | 2021 |
| 1053 | LoveLocal | 2015 | Mumbai | Retail | Vulcan Capital | 8043000.00 | 2021 |
| 1054 | SupplyNote | 2015 | Noida | Food & Beverages | Venture Catalysts | 9000000.00 | 2021 |
1424 rows × 7 columns
Sample6 = Sample5 [Sample5 ['HeadQuarter'] != 'unknown']
Sample6
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount | Year of funding |
|---|---|---|---|---|---|---|---|
| 1 | Ruangguru | 2014 | Mumbai | Edtech | General Atlantic | 100000.00 | 2019 |
| 3 | HomeLane | 2014 | Chennai | Interior design | Evolvence India Fund (EIF) | 400000.00 | 2019 |
| 4 | Nu Genes | 2004 | Telangana | Agritech | Innovation in Food and Agriculture (IFA) | 340000.00 | 2019 |
| 9 | Lenskart | 2010 | Delhi | E-commerce | SoftBank | 1000000.00 | 2019 |
| 10 | Cub McPaws | 2010 | Mumbai | E-commerce & AR | Venture Catalysts | 2000000.00 | 2019 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1050 | Zenduty | 2019 | Bangalore | Computer Software | StartupXseed Ventures | 1500000.00 | 2021 |
| 1051 | R for Rabbit | 2014 | Ahmedabad | Consumer Goods | Xponentia Capital Partners | 13200000.00 | 2021 |
| 1052 | Acko | 2016 | Bangalore | Insurance | General Atlantic | 8000000.00 | 2021 |
| 1053 | LoveLocal | 2015 | Mumbai | Retail | Vulcan Capital | 8043000.00 | 2021 |
| 1054 | SupplyNote | 2015 | Noida | Food & Beverages | Venture Catalysts | 9000000.00 | 2021 |
1367 rows × 7 columns
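The four filtering steps above (Sample3 through Sample6) could also be expressed as one combined boolean mask. The following is a minimal sketch, assuming the same column names and the same placeholder values ('unknown' and 0) as in Merged_Data. The small dataframe `toy` is made-up illustration data, not the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only; the columns mirror Merged_Data.
toy = pd.DataFrame({
    "Company/Brand": ["A", "B", "C", "D", "E"],
    "Founded": [2014, 0, 2019, 2015, 2010],
    "HeadQuarter": ["Mumbai", "Delhi", "unknown", "Pune", "Chennai"],
    "Investor": ["X Capital", "Y Fund", "Z Partners", "unknown", "W Ventures"],
    "Amount": [100000.0, 200000.0, 300000.0, 400000.0, 0.0],
})

# Keep only the rows where every field needed for the analysis is known.
mask = (
    (toy["Investor"] != "unknown")
    & (toy["Amount"] != 0)
    & (toy["Founded"] != 0)
    & (toy["HeadQuarter"] != "unknown")
)
clean = toy[mask]
print(clean)
```

Applied to the notebook's data, the same mask on Merged_Data should select the same rows as Sample6, without the intermediate dataframes.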
# Compute the top 10 once so that the values and labels always have the same length
top_investors = Sample6['Investor'].value_counts()[:10]
fig1 = go.Figure(
    data=go.Pie(values=top_investors.values, labels=top_investors.index,
                title='TOP INVESTORS IN THE INDIAN ECOSYSTEM'))
fig1.show()
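A pie chart needs one label per value, so it can help to take the top-N counts once and derive both lists from that single result. This is a small sketch with made-up investor names; `top_n` is a hypothetical cut-off chosen for the example.

```python
import pandas as pd

# Made-up investor names for illustration only.
investors = pd.Series(["A", "B", "A", "C", "A", "B", "D"])

top_n = 3  # hypothetical cut-off for this example
top = investors.value_counts().head(top_n)

# Both lists come from the same Series, so their lengths always match.
labels = top.index.tolist()
values = top.values.tolist()
print(labels, values)
```

The two lists can then be passed as `labels=` and `values=` to `go.Pie`.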
Question 5: Do Indian startups receive funds from foreign investors, and which sectors receive most of these funds?
Amount_by_Investors_top
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount | Year of funding |
|---|---|---|---|---|---|---|---|
| 1 | Ruangguru | 2014 | Mumbai | Edtech | General Atlantic | 100000.00 | 2019 |
| 3 | HomeLane | 2014 | Chennai | Interior design | Evolvence India Fund (EIF) | 400000.00 | 2019 |
| 4 | Nu Genes | 2004 | Telangana | Agritech | Innovation in Food and Agriculture (IFA) | 340000.00 | 2019 |
| 9 | Lenskart | 2010 | Delhi | E-commerce | SoftBank | 1000000.00 | 2019 |
| 10 | Cub McPaws | 2010 | Mumbai | E-commerce & AR | Venture Catalysts | 2000000.00 | 2019 |
| 13 | JobSquare | 2019 | Ahmedabad | HR tech | Titan Capital | 1200000.00 | 2019 |
| 15 | LivFin | 2017 | Delhi | Fintech | German development finance institution DEG | 660000000.00 | 2019 |
| 17 | Zest Money | 2015 | Bangalore | Fintech | Goldman Sachs. | 7500000.00 | 2019 |
| 19 | Azah Personal Care Pvt. Ltd. | 2018 | Gurugram | Health | Kunal Bahl | 1000000.00 | 2019 |
| 23 | DROR Labs Pvt. Ltd | 2018 | Delhi | Safety tech | Inflection Point Ventures | 500000.00 | 2019 |
# create a new column using dictionary mapping
Amount_by_Investors_top["Investortypes"] = Amount_by_Investors_top['Investor'].map({
    'General Atlantic': 'Foreign Investor',
    ' Evolvence India Fund (EIF), Pidilite Group,FJ': 'Domestic Investor',
    'Venture Catalysts': 'Domestic Investor',
    'Innovation in Food and Agriculture (IFA)': 'Domestic Investor',
    'SoftBank': 'Foreign Investor',
    'Titan Capital': 'Domestic Investor',
    'German development finance institution DEG': 'Foreign Investor',
    'Goldman Sachs.': 'Foreign Investor',
    'Kunal Bahl': 'Domestic Investor',
    'Inflection Point Ventures': 'Domestic Investor',
    'Evolvence India Fund (EIF)': 'Domestic Investor'})
# display the dataframe
Amount_by_Investors_top
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount | Year of funding | Investortypes |
|---|---|---|---|---|---|---|---|---|
| 1 | Ruangguru | 2014 | Mumbai | Edtech | General Atlantic | 100000.00 | 2019 | Foreign Investor |
| 3 | HomeLane | 2014 | Chennai | Interior design | Evolvence India Fund (EIF) | 400000.00 | 2019 | Domestic Investor |
| 4 | Nu Genes | 2004 | Telangana | Agritech | Innovation in Food and Agriculture (IFA) | 340000.00 | 2019 | Domestic Investor |
| 9 | Lenskart | 2010 | Delhi | E-commerce | SoftBank | 1000000.00 | 2019 | Foreign Investor |
| 10 | Cub McPaws | 2010 | Mumbai | E-commerce & AR | Venture Catalysts | 2000000.00 | 2019 | Domestic Investor |
| 13 | JobSquare | 2019 | Ahmedabad | HR tech | Titan Capital | 1200000.00 | 2019 | Domestic Investor |
| 15 | LivFin | 2017 | Delhi | Fintech | German development finance institution DEG | 660000000.00 | 2019 | Foreign Investor |
| 17 | Zest Money | 2015 | Bangalore | Fintech | Goldman Sachs. | 7500000.00 | 2019 | Foreign Investor |
| 19 | Azah Personal Care Pvt. Ltd. | 2018 | Gurugram | Health | Kunal Bahl | 1000000.00 | 2019 | Domestic Investor |
| 23 | DROR Labs Pvt. Ltd | 2018 | Delhi | Safety tech | Inflection Point Ventures | 500000.00 | 2019 | Domestic Investor |
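With `Series.map` and a dictionary, any investor that is missing from the dictionary becomes NaN rather than raising an error. The sketch below shows this on a three-row example. The name "Some Other Fund" and the label "Unclassified" are made up for illustration.

```python
import pandas as pd

# "Some Other Fund" is a made-up name that is deliberately missing from the mapping.
investors = pd.Series(["SoftBank", "Titan Capital", "Some Other Fund"])
investor_types = {
    "SoftBank": "Foreign Investor",
    "Titan Capital": "Domestic Investor",
}

mapped = investors.map(investor_types)    # names without a mapping become NaN
labelled = mapped.fillna("Unclassified")  # hypothetical catch-all label
print(labelled.tolist())
```

Checking `mapped.isna().sum()` after the mapping is a quick way to see how many investors still need to be classified.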
Amount_by_Investors = Amount_by_Investors_top.groupby(by = 'Investortypes').Amount.agg(
["sum"]).sort_values(by = ["sum"], ascending = False)
Amount_by_Investors
| Investortypes | sum |
|---|---|
| Foreign Investor | 668600000.00 |
| Domestic Investor | 5440000.00 |
g1 = go.Figure(
data=go.Pie(values=Amount_by_Investors_top ['Investortypes'].value_counts().values,
labels=Amount_by_Investors_top ['Investortypes'].value_counts().index,
title='FOREIGN TO DOMESTIC INVESTMENT IN INDIA'))
g1.show()
A subset of investors is selected and put in a new dataframe so that we can do our analysis.
# Plot Amount against Sector explicitly; otherwise every numeric column is plotted against the row index
Amount_by_Investors_top.plot(kind='bar', x='Sector', y='Amount', title='FUNDING BY SECTOR', ylabel='Amount',
                             xlabel='Sector', figsize=(25, 10))
<AxesSubplot:title={'center':'FUNDING BY SECTOR'}, xlabel='Sector', ylabel='Amount'>
Amount_by_Sector = Amount_by_Investors_top.groupby(by = 'Sector').Amount.agg(
["sum"]).sort_values(by = ["sum"], ascending = False)
Amount_by_Sector
| Sector | sum |
|---|---|
| Fintech | 667500000.00 |
| E-commerce & AR | 2000000.00 |
| HR tech | 1200000.00 |
| E-commerce | 1000000.00 |
| Health | 1000000.00 |
| Safety tech | 500000.00 |
| Interior design | 400000.00 |
| Agritech | 340000.00 |
| Edtech | 100000.00 |
sns.barplot(x = Amount_by_Sector.index, y = Amount_by_Sector["sum"])
plt.title("SECTORS MOST FUNDED BY FOREIGN INVESTORS")
plt.ylabel("Total Amount of Funding")
plt.xlabel("Sector")
Text(0.5, 0, 'Sector')
BAR PLOT REPRESENTATION OF DATA
FUNDING WITHOUT FINTECH, WHICH IS CONSIDERED AN EXTREME OUTLIER
Amount_by_Investors_top.drop([15,17], axis=0, inplace=True)
Amount_by_Investors_top
| | Company/Brand | Founded | HeadQuarter | Sector | Investor | Amount | Year of funding | Investortypes |
|---|---|---|---|---|---|---|---|---|
| 1 | Ruangguru | 2014 | Mumbai | Edtech | General Atlantic | 100000.00 | 2019 | Foreign Investor |
| 3 | HomeLane | 2014 | Chennai | Interior design | Evolvence India Fund (EIF) | 400000.00 | 2019 | Domestic Investor |
| 4 | Nu Genes | 2004 | Telangana | Agritech | Innovation in Food and Agriculture (IFA) | 340000.00 | 2019 | Domestic Investor |
| 9 | Lenskart | 2010 | Delhi | E-commerce | SoftBank | 1000000.00 | 2019 | Foreign Investor |
| 10 | Cub McPaws | 2010 | Mumbai | E-commerce & AR | Venture Catalysts | 2000000.00 | 2019 | Domestic Investor |
| 13 | JobSquare | 2019 | Ahmedabad | HR tech | Titan Capital | 1200000.00 | 2019 | Domestic Investor |
| 19 | Azah Personal Care Pvt. Ltd. | 2018 | Gurugram | Health | Kunal Bahl | 1000000.00 | 2019 | Domestic Investor |
| 23 | DROR Labs Pvt. Ltd | 2018 | Delhi | Safety tech | Inflection Point Ventures | 500000.00 | 2019 | Domestic Investor |
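The Fintech rows are dropped above by their index labels (15 and 17), which only works while those labels stay the same. An alternative sketch is to filter on the Sector column instead. The four rows below are copied from the table above, and `without_fintech` is a name introduced for this example.

```python
import pandas as pd

# Four rows taken from the table above, with their original index labels.
toy = pd.DataFrame(
    {
        "Sector": ["Edtech", "Fintech", "Fintech", "Health"],
        "Amount": [100000.0, 660000000.0, 7500000.0, 1000000.0],
    },
    index=[1, 15, 17, 19],
)

# Select by value rather than by position; the original dataframe is left unchanged.
without_fintech = toy[toy["Sector"] != "Fintech"]
print(without_fintech)
```

Because nothing is modified in place, the full dataframe remains available for the plots that include Fintech.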
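# Plot Amount against Sector explicitly; otherwise every numeric column is plotted against the row index
Amount_by_Investors_top.plot(kind='bar', x='Sector', y='Amount', title='SECTORS MOST FUNDED BY FOREIGN INVESTORS',
                             ylabel='Amount',
                             xlabel='Sector', figsize=(25, 10))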
<AxesSubplot:title={'center':'SECTORS MOST FUNDED BY FOREIGN INVESTORS'}, xlabel='Sector', ylabel='Amount'>
Fintech is the sector most favored by foreign investors.
Data_2018.head()
| | Company Name | Industry | Round/Series | Amount | Location | About Company | Year of funding |
|---|---|---|---|---|---|---|---|
| 0 | TheCollegeFever | Brand Marketing, Event Promotion, Marketing, S... | Seed | 250000 | Bangalore, Karnataka, India | TheCollegeFever is a hub for fun, fiesta and f... | 2018 |
| 1 | Happy Cow Dairy | Agriculture, Farming | Seed | ₹40,000,000 | Mumbai, Maharashtra, India | A startup which aggregates milk from dairy far... | 2018 |
| 2 | MyLoanCare | Credit, Financial Services, Lending, Marketplace | Series A | ₹65,000,000 | Gurgaon, Haryana, India | Leading Online Loans Marketplace in India | 2018 |
| 3 | PayMe India | Financial Services, FinTech | Angel | 2000000 | Noida, Uttar Pradesh, India | PayMe India is an innovative FinTech organizat... | 2018 |
| 4 | Eunimart | E-Commerce Platforms, Retail, SaaS | Seed | — | Hyderabad, Andhra Pradesh, India | Eunimart is a one stop solution for merchants ... | 2018 |
Data_2018.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 525 entries, 0 to 525
Data columns (total 7 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company Name     525 non-null    object
 1   Industry         525 non-null    object
 2   Round/Series     525 non-null    object
 3   Amount           525 non-null    object
 4   Location         525 non-null    object
 5   About Company    525 non-null    object
 6   Year of funding  525 non-null    int64
dtypes: int64(1), object(6)
memory usage: 32.8+ KB
Data_2018.rename(columns = {'Location':'Headquarter','Industry': 'Sector'},inplace = True)
Data_2018
| | Company Name | Sector | Round/Series | Amount | Headquarter | About Company | Year of funding |
|---|---|---|---|---|---|---|---|
| 0 | TheCollegeFever | Brand Marketing, Event Promotion, Marketing, S... | Seed | 250000 | Bangalore, Karnataka, India | TheCollegeFever is a hub for fun, fiesta and f... | 2018 |
| 1 | Happy Cow Dairy | Agriculture, Farming | Seed | ₹40,000,000 | Mumbai, Maharashtra, India | A startup which aggregates milk from dairy far... | 2018 |
| 2 | MyLoanCare | Credit, Financial Services, Lending, Marketplace | Series A | ₹65,000,000 | Gurgaon, Haryana, India | Leading Online Loans Marketplace in India | 2018 |
| 3 | PayMe India | Financial Services, FinTech | Angel | 2000000 | Noida, Uttar Pradesh, India | PayMe India is an innovative FinTech organizat... | 2018 |
| 4 | Eunimart | E-Commerce Platforms, Retail, SaaS | Seed | — | Hyderabad, Andhra Pradesh, India | Eunimart is a one stop solution for merchants ... | 2018 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 521 | Udaan | B2B, Business Development, Internet, Marketplace | Series C | 225000000 | Bangalore, Karnataka, India | Udaan is a B2B trade platform, designed specif... | 2018 |
| 522 | Happyeasygo Group | Tourism, Travel | Series A | — | Haryana, Haryana, India | HappyEasyGo is an online travel domain. | 2018 |
| 523 | Mombay | Food and Beverage, Food Delivery, Internet | Seed | 7500 | Mumbai, Maharashtra, India | Mombay is a unique opportunity for housewives ... | 2018 |
| 524 | Droni Tech | Information Technology | Seed | ₹35,000,000 | Mumbai, Maharashtra, India | Droni Tech manufacture UAVs and develop softwa... | 2018 |
| 525 | Netmeds | Biotechnology, Health Care, Pharmaceutical | Series C | 35000000 | Chennai, Tamil Nadu, India | Welcome to India's most convenient pharmacy! | 2018 |
525 rows × 7 columns
Data_2018.drop(['About Company','Round/Series'],axis=1, inplace = True)
Data_2018.head()
| | Company Name | Sector | Amount | Headquarter | Year of funding |
|---|---|---|---|---|---|
| 0 | TheCollegeFever | Brand Marketing, Event Promotion, Marketing, S... | 250000 | Bangalore, Karnataka, India | 2018 |
| 1 | Happy Cow Dairy | Agriculture, Farming | ₹40,000,000 | Mumbai, Maharashtra, India | 2018 |
| 2 | MyLoanCare | Credit, Financial Services, Lending, Marketplace | ₹65,000,000 | Gurgaon, Haryana, India | 2018 |
| 3 | PayMe India | Financial Services, FinTech | 2000000 | Noida, Uttar Pradesh, India | 2018 |
| 4 | Eunimart | E-Commerce Platforms, Retail, SaaS | — | Hyderabad, Andhra Pradesh, India | 2018 |
Data_2018.isnull().sum()
Company Name       0
Sector             0
Amount             0
Headquarter        0
Year of funding    0
dtype: int64
Data_2018['Sector'] = Data_2018['Sector'].str.replace('-', '', n=1)
Data_2018
| | Company Name | Sector | Amount | Headquarter | Year of funding |
|---|---|---|---|---|---|
| 0 | TheCollegeFever | Brand Marketing, Event Promotion, Marketing, S... | 250000 | Bangalore, Karnataka, India | 2018 |
| 1 | Happy Cow Dairy | Agriculture, Farming | ₹40,000,000 | Mumbai, Maharashtra, India | 2018 |
| 2 | MyLoanCare | Credit, Financial Services, Lending, Marketplace | ₹65,000,000 | Gurgaon, Haryana, India | 2018 |
| 3 | PayMe India | Financial Services, FinTech | 2000000 | Noida, Uttar Pradesh, India | 2018 |
| 4 | Eunimart | ECommerce Platforms, Retail, SaaS | — | Hyderabad, Andhra Pradesh, India | 2018 |
| ... | ... | ... | ... | ... | ... |
| 521 | Udaan | B2B, Business Development, Internet, Marketplace | 225000000 | Bangalore, Karnataka, India | 2018 |
| 522 | Happyeasygo Group | Tourism, Travel | — | Haryana, Haryana, India | 2018 |
| 523 | Mombay | Food and Beverage, Food Delivery, Internet | 7500 | Mumbai, Maharashtra, India | 2018 |
| 524 | Droni Tech | Information Technology | ₹35,000,000 | Mumbai, Maharashtra, India | 2018 |
| 525 | Netmeds | Biotechnology, Health Care, Pharmaceutical | 35000000 | Chennai, Tamil Nadu, India | 2018 |
525 rows × 5 columns
Data_2018['Sector'] =Data_2018['Sector'].apply(str) # To apply string formatting to the whole column
Data_2018['Sector'] =Data_2018['Sector'].str.split(',').str[0] # To separate the values in the column by commas and select the first value only
Data_2018['Sector'] = Data_2018['Sector'].replace("'", "", regex=True) # Remove any ' that may be attached to the data
Data_2018.head()
| | Company Name | Sector | Amount | Headquarter | Year of funding |
|---|---|---|---|---|---|
| 0 | TheCollegeFever | Brand Marketing | 250000 | Bangalore, Karnataka, India | 2018 |
| 1 | Happy Cow Dairy | Agriculture | ₹40,000,000 | Mumbai, Maharashtra, India | 2018 |
| 2 | MyLoanCare | Credit | ₹65,000,000 | Gurgaon, Haryana, India | 2018 |
| 3 | PayMe India | Financial Services | 2000000 | Noida, Uttar Pradesh, India | 2018 |
| 4 | Eunimart | ECommerce Platforms | — | Hyderabad, Andhra Pradesh, India | 2018 |
Data_2018['Headquarter'] =Data_2018['Headquarter'].apply(str) # To apply string formatting to the whole column
Data_2018['Headquarter'] =Data_2018['Headquarter'].str.split(',').str[0] # To separate the values in the column by commas and select the first value only
Data_2018['Headquarter'] = Data_2018['Headquarter'].replace("'", "", regex=True) # Remove any ' that may be attached to the data
Data_2018.head()
| | Company Name | Sector | Amount | Headquarter | Year of funding |
|---|---|---|---|---|---|
| 0 | TheCollegeFever | Brand Marketing | 250000 | Bangalore | 2018 |
| 1 | Happy Cow Dairy | Agriculture | ₹40,000,000 | Mumbai | 2018 |
| 2 | MyLoanCare | Credit | ₹65,000,000 | Gurgaon | 2018 |
| 3 | PayMe India | Financial Services | 2000000 | Noida | 2018 |
| 4 | Eunimart | ECommerce Platforms | — | Hyderabad | 2018 |
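The Sector and Headquarter columns are cleaned with the same three steps, so the steps could be wrapped in one small helper. This is a sketch only: `first_item` is a hypothetical name, and the extra `.str.strip()` call is an addition that removes surrounding spaces.

```python
import pandas as pd

def first_item(column: pd.Series) -> pd.Series:
    """Return the text before the first comma, without stray quotes or surrounding spaces."""
    return (
        column.astype(str)                   # make sure every value is a string
        .str.split(",")                      # split on commas
        .str[0]                              # keep the first value only
        .str.replace("'", "", regex=False)   # remove any ' attached to the value
        .str.strip()                         # added step: trim surrounding whitespace
    )

# Sample values taken from the tables above.
sectors = pd.Series(["Agriculture, Farming", "Financial Services, FinTech"])
locations = pd.Series(["Mumbai, Maharashtra, India", "Noida, Uttar Pradesh, India"])

print(first_item(sectors).tolist())
print(first_item(locations).tolist())
```

In the notebook this would be used as `Data_2018['Sector'] = first_item(Data_2018['Sector'])`, and likewise for `Headquarter`.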
# Cleaning the Amount column
## Removing the commas, dashes and dollar signs from the amounts
Data_2018['Amount'] = Data_2018['Amount'].astype(str)
Data_2018['Amount'] = Data_2018['Amount'].str.replace(",", "", regex=False)
Data_2018['Amount'] = Data_2018['Amount'].str.replace("—", "0", regex=False)
Data_2018['Amount'] = Data_2018['Amount'].str.replace("$", "", regex=False)  # "$" is a regex anchor, so match it literally
## Creating temporary columns to help with the conversion of INR to USD
Data_2018['INR Amount'] = Data_2018['Amount'].str.rsplit('₹', n = 2).str[1]  # NaN for rows that are not in INR
Data_2018['INR Amount'] = Data_2018['INR Amount'].astype(float).fillna(0)
Data_2018['USD Amount'] = Data_2018['INR Amount'] * 0.0146  # INR to USD conversion rate used in this notebook
Data_2018['USD Amount'] = Data_2018['USD Amount'].replace(0, np.nan)
Data_2018['USD Amount'] = Data_2018['USD Amount'].fillna(Data_2018['Amount'])  # rows that were already in USD
Data_2018['Amount'] = Data_2018['USD Amount'].astype(float)
Data_2018['Amount'] = Data_2018['Amount'].replace(0, np.nan)
## Dropping the temporary columns, which are absent from the output below
Data_2018.drop(columns=['INR Amount', 'USD Amount'], inplace=True)
Data_2018.head()
| | Company Name | Sector | Amount | Headquarter | Year of funding |
|---|---|---|---|---|---|
| 0 | TheCollegeFever | Brand Marketing | 250000.00 | Bangalore | 2018 |
| 1 | Happy Cow Dairy | Agriculture | 584000.00 | Mumbai | 2018 |
| 2 | MyLoanCare | Credit | 949000.00 | Gurgaon | 2018 |
| 3 | PayMe India | Financial Services | 2000000.00 | Noida | 2018 |
| 4 | Eunimart | ECommerce Platforms | NaN | Hyderabad | 2018 |
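The cleaning steps above can also be read as a single rule applied to each raw value. The sketch below writes that rule as a plain Python function. `parse_amount` and `INR_TO_USD` are hypothetical names, and the rate is the 0.0146 used in the notebook. Unlike the notebook, it returns NaN directly for a missing amount instead of passing through 0.

```python
import math

INR_TO_USD = 0.0146  # conversion rate used in the notebook above

def parse_amount(raw: str) -> float:
    """Convert a raw amount string to USD; return NaN when no amount is given."""
    text = str(raw).replace(",", "").strip()
    if text in ("", "—"):              # the dataset marks undisclosed amounts with a dash
        return math.nan
    if text.startswith("₹"):           # rupee amounts are converted to USD
        return float(text[1:]) * INR_TO_USD
    return float(text.lstrip("$"))     # everything else is treated as USD

# Sample values taken from the tables above.
for raw in ["250000", "₹40,000,000", "$1,200,000", "—"]:
    print(raw, "->", parse_amount(raw))
```

It could be applied to the whole column with `Data_2018['Amount'].apply(parse_amount)`.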
Data_2018.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 525 entries, 0 to 525
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Company Name     525 non-null    object
 1   Sector           525 non-null    object
 2   Amount           377 non-null    float64
 3   Headquarter      525 non-null    object
 4   Year of funding  525 non-null    int64
dtypes: float64(1), int64(1), object(3)
memory usage: 24.6+ KB
Data_2018.isnull().sum()
Company Name         0
Sector               0
Amount             148
Headquarter          0
Year of funding      0
dtype: int64
Data_2018['Amount'] = Data_2018['Amount'].fillna(0)
Data_2018.isnull().sum()
Company Name       0
Sector             0
Amount             0
Headquarter        0
Year of funding    0
dtype: int64